Abstract

The effectiveness of twenty public search engines is 
evaluated using TREC-inspired methods and a set of 54 queries 
taken from real Web search logs. The World Wide Web is taken 
as the test collection and a combination of crawler and text 
retrieval system is evaluated. The engines are compared on a 
range of measures derivable from binary relevance judgments of 
the first seven live results returned. Statistical testing reveals a 
significant difference between engines and high intercorrelations 
between measures. Surprisingly, given the dynamic nature of the 
Web and the time elapsed, there is also a high correlation 
between results of this study and a previous study by Gordon 
and Pathak. For nearly all engines, there is a gradual decline in 
precision at increasing cutoff after some initial fluctuation. 
Performance of the engines as a group is found to be inferior to 
the group of participants in the TREC-8 Large Web task, although 
the best engines approach the median of those systems. 
Shortcomings of current Web search evaluation methodology are 
identified and recommendations are made for future 
improvements. In particular, the present study and its 
predecessors deal with queries which are assumed to derive from 
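
The precision-at-cutoff measure mentioned above can be illustrated with a minimal sketch. The function names, the example judgments, and the choice to count missing results as non-relevant are assumptions made for illustration; they are not taken from the study itself.

```python
from typing import List, Sequence


def precision_at_k(judgments: Sequence[int], k: int) -> float:
    """Fraction of the top-k results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Assumption: if fewer than k results were returned, the missing
    # positions count as non-relevant, so the divisor stays k.
    return sum(judgments[:k]) / k


def precision_curve(judgments: Sequence[int], max_k: int = 7) -> List[float]:
    """Precision at every cutoff from 1 to max_k (seven results, as in the abstract)."""
    return [precision_at_k(judgments, k) for k in range(1, max_k + 1)]


# Hypothetical binary judgments for the first seven live results of one query.
example = [1, 1, 0, 1, 0, 0, 1]
curve = precision_curve(example)
```

Averaging such curves over all queries for one engine would give the per-engine precision at each cutoff that the comparison relies on.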

Abstract 

With the web at close to a billion pages and growing at an exponential rate, we are faced 
with the issue of rating pages in terms of quality and trust. In this situation, what other 
pages say about a web page can be as important as what the page says about itself. The 
cumulative knowledge of these types of recommendations (or the lack thereof) can be 
objective enough to help a user or robot program to decide whether or not to pursue a 
web document. In addition, these annotations or metadata can be used by a web robot 
program to derive summary information about web documents that are written in a 
language that the robot does not understand. We use this idea to drive a web information 
gathering system that forms the core of a topic-specific search engine. 
In this paper, we describe how our system uses metadata about the hyperlinks to guide 
itself to crawl the web. It sifts through useful information related to a particular topic to 
eliminate the traversal of links that may not be of interest. Thus, the guided crawling 
system stays focused on the target topic. It builds a rich repository of link information that 
includes metadata. This repository ultimately serves a search engine 
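
The guided crawling idea described above can be sketched as a priority-ordered traversal that scores each link by its metadata and skips links that score too low. Everything in this sketch is hypothetical: the in-memory toy web, the use of anchor text as the only link metadata, the term-overlap score, and the threshold are illustrative assumptions, not the system's actual design.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical link record: (target URL, anchor text describing the target).
Link = Tuple[str, str]


def link_score(anchor_text: str, topic_terms: Set[str]) -> float:
    """Fraction of the topic terms that appear in the link's anchor text."""
    if not topic_terms:
        return 0.0
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def guided_crawl(
    web: Dict[str, List[Link]],
    seed: str,
    topic_terms: Set[str],
    threshold: float = 0.5,
) -> List[str]:
    """Visit pages in order of link score, never following low-scoring links."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    visited: Set[str] = set()
    order: List[str] = []
    while frontier:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for target, anchor in web.get(url, []):
            score = link_score(anchor, topic_terms)
            # Links whose metadata looks off-topic are not traversed at all.
            if score >= threshold and target not in visited:
                heapq.heappush(frontier, (-score, target))
    return order


# Toy web graph standing in for real pages (an assumption for illustration).
toy_web = {
    "seed": [("a", "jazz guitar lessons"), ("b", "cheap flights"), ("c", "jazz history")],
    "a": [("d", "guitar chords for jazz")],
    "c": [],
}
crawl_order = guided_crawl(toy_web, "seed", {"jazz", "guitar"})
```

In this toy run the off-topic link to "b" is never followed, which is the sense in which the crawl stays focused on the target topic.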